GH-49452: [Python] Reintroduce docstring injection for stubfiles #49453

Open
rok wants to merge 7 commits into apache:main from rok:docstring-injection

Conversation

@rok
Member

@rok rok commented Mar 4, 2026

Rationale for this change

Warning: should not be merged before #49259.
See #49452 and #48618

What changes are included in this PR?

Adds a wheel-build-time script that populates the stub files with runtime docstrings.
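
The PR text does not show update_stub_docstrings.py itself, so the following is only a minimal sketch of the general idea, assuming the stubs are plain .pyi files that the ast module can parse and that the runtime module is already importable. The names inject_docstrings and _is_ellipsis_body are hypothetical. The sketch walks functions and classes in the stub, looks up the matching runtime object, and adds its __doc__ wherever the stub has no docstring (Python 3.9+ for ast.unparse).

```python
import ast
import types

_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _is_ellipsis_body(node: ast.AST) -> bool:
    # Stub bodies are often a lone "..." placeholder.
    body = node.body
    return (
        len(body) == 1
        and isinstance(body[0], ast.Expr)
        and isinstance(body[0].value, ast.Constant)
        and body[0].value.value is Ellipsis
    )


def inject_docstrings(stub_source: str, runtime_module: types.ModuleType) -> str:
    """Return stub_source with runtime docstrings added where the stub has none."""
    tree = ast.parse(stub_source)

    def visit(body: list, owner: object) -> None:
        for node in body:
            if not isinstance(node, _DEFS):
                continue
            runtime_obj = getattr(owner, node.name, None)
            if runtime_obj is None:
                continue  # no runtime counterpart; leave the stub untouched
            doc = getattr(runtime_obj, "__doc__", None)
            if isinstance(doc, str) and ast.get_docstring(node) is None:
                doc_stmt = ast.Expr(value=ast.Constant(value=doc))
                if _is_ellipsis_body(node):
                    node.body = [doc_stmt]  # the docstring replaces "..."
                else:
                    node.body.insert(0, doc_stmt)
            # Recurse into classes so methods (and nested classes) are handled.
            if isinstance(node, ast.ClassDef):
                visit(node.body, runtime_obj)

    visit(tree.body, runtime_module)
    return ast.unparse(ast.fix_missing_locations(tree))
```

Because ast.unparse regenerates the source, comments and original formatting in the stub are lost; a real script might prefer a concrete-syntax-tree approach to preserve them.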

Are these changes tested?

Not yet.

Are there any user-facing changes?

Users will get runtime docstrings in the pyarrow type stubs (.pyi files) shipped with the wheels.

@github-actions

github-actions bot commented Mar 4, 2026

⚠️ GitHub issue #49452 has been automatically assigned in GitHub to PR creator.

@rok force-pushed the docstring-injection branch from 095ee4c to 2056fec on March 9, 2026 17:37
@rok
Member Author

rok commented Mar 9, 2026

@raulcd this is ready for review

@rok requested review from assignUser, jonkeane and kou as code owners on March 10, 2026 00:34
@rok force-pushed the docstring-injection branch from 2f0f841 to f885111 on March 10, 2026 00:57
@raulcd
Member

raulcd commented Mar 10, 2026

@github-actions crossbow submit wheel*-cp313-*

@github-actions

Revision: f885111

Submitted crossbow builds: ursacomputing/crossbow @ actions-c1678cd8dd

Task Status
wheel-macos-monterey-cp313-cp313-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
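
The crossbow submit wheel*-cp313-* command selects tasks by name pattern. Assuming shell-style glob matching (what Python's fnmatch module implements; archery's exact matching code is not shown in this thread), the pattern picks every wheel task whose name contains -cp313-, which covers both the regular and the free-threaded (cp313t) builds in the table above. The cp312/cp314 names below are hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN = "wheel*-cp313-*"

candidate_tasks = [
    "wheel-manylinux-2-28-cp313-cp313-amd64",  # listed above
    "wheel-windows-cp313-cp313t-amd64",        # listed above (free-threaded)
    "wheel-manylinux-2-28-cp312-cp312-amd64",  # hypothetical, other CPython
    "wheel-macos-monterey-cp314-cp314-arm64",  # hypothetical, other CPython
]

# fnmatchcase: "*" matches any run of characters, case-sensitively.
selected = [task for task in candidate_tasks if fnmatchcase(task, PATTERN)]
print(selected)
# ['wheel-manylinux-2-28-cp313-cp313-amd64', 'wheel-windows-cp313-cp313t-amd64']
```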

Member

@raulcd raulcd left a comment

@rok the new check in ci/scripts/python_wheel_validate_contents.py doesn't seem to work :(

@github-actions bot added the awaiting changes and awaiting change review labels and removed the awaiting committer review and awaiting changes labels on Mar 10, 2026
@rok force-pushed the docstring-injection branch from d638b95 to 10509c4 on March 10, 2026 12:49
@rok
Member Author

rok commented Mar 10, 2026

@github-actions crossbow submit wheel*-cp313-*

@github-actions

Revision: 10509c4

Submitted crossbow builds: ursacomputing/crossbow @ actions-f3f679ce77

Task Status
wheel-macos-monterey-cp313-cp313-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions

@rok force-pushed the docstring-injection branch from 10509c4 to e4903eb on March 10, 2026 13:18
@rok
Member Author

rok commented Mar 10, 2026

@github-actions crossbow submit wheel*-cp313-*

@github-actions

Revision: e4903eb

Submitted crossbow builds: ursacomputing/crossbow @ actions-a33b20ecd0

Task Status
wheel-macos-monterey-cp313-cp313-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions

@rok
Member Author

rok commented Mar 10, 2026

@raulcd the docstring presence check was apparently too strict. I think this is ready for review now.
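
The thread does not show what the check in ci/scripts/python_wheel_validate_contents.py looks like before or after it was relaxed. As a hypothetical illustration only, a lenient presence check could open the built wheel (a zip archive), parse each .pyi member with ast, and require that some definitions carry docstrings rather than all of them, since not every stubbed object necessarily has a runtime docstring to copy. The function names and the min_documented threshold are assumptions.

```python
import ast
import zipfile

_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def count_stub_docstrings(wheel_path: str) -> tuple:
    """Return (documented, total) function/class definitions across all .pyi files."""
    documented = total = 0
    with zipfile.ZipFile(wheel_path) as whl:  # a wheel is a zip archive
        for name in whl.namelist():
            if not name.endswith(".pyi"):
                continue
            tree = ast.parse(whl.read(name).decode("utf-8"), filename=name)
            for node in ast.walk(tree):
                if isinstance(node, _DEFS):
                    total += 1
                    if ast.get_docstring(node) is not None:
                        documented += 1
    return documented, total


def check_stub_docstrings(wheel_path: str, min_documented: int = 1) -> None:
    """Lenient check: fail only if fewer than min_documented definitions have docstrings."""
    documented, total = count_stub_docstrings(wheel_path)
    if documented < min_documented:
        raise ValueError(
            f"only {documented} of {total} stub definitions in {wheel_path} have docstrings"
        )
```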

@rok
Member Author

rok commented Mar 12, 2026

@raulcd any chance we could get this reviewed by the end of the week?

Member

@raulcd raulcd left a comment

Can we add some documentation about the process?
Mainly the CMake option and the stub generation (update_stub_docstrings.py).

Right now I am not entirely sure I understand the process for update_stub_docstrings.py.

Populate pyarrow_pkg with source Python modules and installed binary artifacts
so that pyarrow can be imported from the parent directory of pyarrow_pkg.
"""
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
Member

Is this EXT_SUFFIX env var necessary? It doesn't seem to be used anywhere apart from this line.

Member Author

It is used below as:

is_extension = ext_suffix in artifact.name or artifact.suffix == ".pyd"

Member

Sorry, I meant whether it can be something other than the default "*.so". I'm not sure it's necessary to be able to override it as an env variable; what's the use case?

Member Author

@rok rok Mar 12, 2026

As per 🤖:
sysconfig.get_config_var("EXT_SUFFIX") returns the platform-specific extension suffix (e.g. .cpython-313-x86_64-linux-gnu.so on Linux, .cpython-313-darwin.so on macOS, .cp313-win_amd64.pyd on Windows). We need it to correctly distinguish compiled extension modules from other files in the install directory; a plain ".so" check would miss the full CPython-tagged suffixes.
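
To make the distinction concrete, here is the quoted is_extension test wrapped in a hypothetical helper, with the suffix passed in explicitly so the behaviour can be checked against a Linux-style EXT_SUFFIX on any platform. The filenames are made-up examples.

```python
import sysconfig
from pathlib import Path

# Platform-specific tag, e.g. ".cpython-313-x86_64-linux-gnu.so" on Linux;
# get_config_var returns None if the variable is unknown, hence the fallback.
EXT_SUFFIX = sysconfig.get_config_var("EXT_SUFFIX") or ".so"


def is_extension_module(artifact: Path, ext_suffix: str = EXT_SUFFIX) -> bool:
    """Same test as quoted in the review: tagged suffix in the name, or a .pyd file."""
    return ext_suffix in artifact.name or artifact.suffix == ".pyd"


# Evaluate against a Linux-style suffix regardless of the current platform.
LINUX_SUFFIX = ".cpython-313-x86_64-linux-gnu.so"
for filename in ["lib.cpython-313-x86_64-linux-gnu.so", "libarrow.so.2300", "lib.py"]:
    print(filename, is_extension_module(Path(filename), LINUX_SUFFIX))
# lib.cpython-313-x86_64-linux-gnu.so True
# libarrow.so.2300 False
# lib.py False
```

Note that a bare ".so" substring test would also accept a versioned shared library such as libarrow.so.2300 (hypothetical name), whereas the tagged-suffix check does not.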

Member Author

I'll just try copying everything but the .pyi files, as this really is unnecessarily complicated. Sorry.
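
Copying everything except the .pyi stubs maps naturally onto the standard library's shutil.copytree with an ignore callable. This is a sketch, not the PR's code: stage_without_stubs is a hypothetical name, and it assumes the staging directory may already exist (hence dirs_exist_ok=True, Python 3.8+).

```python
import shutil
from pathlib import Path


def stage_without_stubs(src: Path, dest: Path) -> None:
    """Copy the installed package tree from src to dest, skipping .pyi stub files."""
    # ignore_patterns builds a callable that filters names in every directory visited.
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns("*.pyi"), dirs_exist_ok=True)
```

Compared with suffix matching, this needs no knowledge of platform-specific extension suffixes: .so, .pyd, .dylib and .py files are all copied alike.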

if(DEFINED SKBUILD_STATE
AND SKBUILD_STATE STREQUAL "wheel"
AND NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten"
AND DEFINED ENV{CI}
Member

If I understand this correctly, this will default to ON for wheel builds and only on CI; otherwise it is OFF and we have to turn it on manually, e.g. when testing locally?

Is this logic necessary? Should we just remove all that, add the option, and enable it in the wheel build scripts by setting an env var? We can add PYARROW_REQUIRE_STUB_DOCSTRINGS as a possible option in our pyproject.toml:
PYARROW_REQUIRE_STUB_DOCSTRINGS = {env = "PYARROW_REQUIRE_STUB_DOCSTRINGS", default = "OFF"} (similar to PYARROW_BUNDLE_ARROW_CPP)
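
A hedged Python sketch of the proposed flow: the option stays OFF unless the environment variable is set, and only the wheel build sets it. The helper names are hypothetical, and forwarding the variable to CMake is assumed to happen through the pyproject.toml mapping quoted above (scikit-build-core syntax, judging by the SKBUILD_STATE check); the sketch only builds the environment and the command, it does not run a build.

```python
import sys

FLAG = "PYARROW_REQUIRE_STUB_DOCSTRINGS"
_TRUTHY = {"1", "ON", "TRUE", "YES", "Y"}  # subset of CMake's true constants


def flag_enabled(env) -> bool:
    """Unset means OFF, matching the proposed default = "OFF"."""
    return env.get(FLAG, "OFF").strip().upper() in _TRUTHY


def wheel_build_env(base_env) -> dict:
    """Copy the environment and switch the requirement ON for a wheel build."""
    env = dict(base_env)
    env[FLAG] = "ON"
    return env


def wheel_build_command(source_dir: str, out_dir: str) -> list:
    """`pip wheel` invokes the project's PEP 517 build backend."""
    return [sys.executable, "-m", "pip", "wheel", "--no-deps", "-w", out_dir, source_dir]
```

A build script could then call subprocess.run(wheel_build_command(".", "dist"), env=wheel_build_env(os.environ), check=True), while developers who never set the variable keep the OFF default.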

Member Author

Ok, let me try that. Do we want to enable PYARROW_REQUIRE_STUB_DOCSTRINGS at dev time?

def _create_importable_pyarrow(pyarrow_pkg, source_dir, install_pyarrow_dir):
"""
stubs_dir, build_lib = Path(stubs_dir), Path(build_lib)
Populate pyarrow_pkg with source Python modules and installed binary artifacts
Member

What does "Populate pyarrow_pkg" mean? I am not sure I understand what pyarrow_pkg is here. Is it the source tree? Are we copying the Arrow shared objects around to populate the stubs? Otherwise I am unsure what we are copying/linking. Isn't this what PYARROW_BUNDLE_ARROW_CPP does? Maybe we have to build after CMake runs the copy for PYARROW_BUNDLE_ARROW_CPP?

Member Author

The idea is that pyarrow_pkg is a folder where we can import pyarrow from, so we can grab runtime docstrings from it. Updating comment to make it clearer.
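
As described, the point of pyarrow_pkg is that its parent directory can go on sys.path so a regular import yields the runtime objects and their docstrings. A minimal sketch of that step, using a hypothetical package name so the example does not depend on a built pyarrow; importable_from and runtime_docstrings are illustrative names, not the PR's functions.

```python
import importlib
import inspect
import sys
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def importable_from(parent_dir: Path):
    """Temporarily put parent_dir at the front of sys.path."""
    entry = str(parent_dir)
    sys.path.insert(0, entry)
    try:
        yield
    finally:
        sys.path.remove(entry)  # removes the first occurrence, i.e. ours


def runtime_docstrings(parent_dir: Path, package: str) -> dict:
    """Import `package` from parent_dir; map public functions/classes to docstrings."""
    with importable_from(parent_dir):
        importlib.invalidate_caches()  # the directory may have been created just now
        module = importlib.import_module(package)
    return {
        name: obj.__doc__
        for name, obj in vars(module).items()
        if not name.startswith("_")
        and (inspect.isroutine(obj) or inspect.isclass(obj))
        and obj.__doc__
    }
```

If the package is already in sys.modules, import_module returns the cached module rather than the staged copy, so a real script would run this in a fresh interpreter or check sys.modules first.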

@github-actions bot added the awaiting changes label and removed the awaiting change review label on Mar 12, 2026
Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
@github-actions bot added the awaiting change review label and removed the awaiting changes label on Mar 12, 2026
@rok
Member Author

rok commented Mar 12, 2026

Thanks for the review, @raulcd! I've pushed some changes; can you check whether this makes sense now?

@github-actions bot added the awaiting changes label and removed the awaiting change review label on Mar 12, 2026
@rok
Member Author

rok commented Mar 12, 2026

@kou could you do a pass here too? Especially feedback on CMake changes would be valuable :).

@github-actions bot added the awaiting change review label and removed the awaiting changes label on Mar 12, 2026
@rok force-pushed the docstring-injection branch from 5576c51 to e7e51db on March 12, 2026 21:45
@rok force-pushed the docstring-injection branch from fab8f56 to a1b43d8 on March 12, 2026 22:36
Member

@kou kou left a comment

+1

@github-actions bot added the awaiting merge label and removed the awaiting change review label on Mar 13, 2026
Development

Successfully merging this pull request may close these issues.

[Python] Reintroduce docstring injection for stubfiles

3 participants